Adaptively Mapping Parallelism Based on System Workload Using Machine Learning

نویسنده

  • Dominik Grewe
چکیده

Parallel Computing has become pervasive and the number of processors placed in computers will further increase in the future. However, software developers are struggling to efficiently exploit the computational resources provided by parallel architectures. It is thus inevitable to investigate the behaviour of parallel programs and develop methods that help improving their performance. The software developer should not have to worry about how to map parallelism to the underlying architecture, but should instead concentrate on exposing the parallelism and leave the mapping task to the runtime system. In this project, the behaviour of parallel programs in the presence of workload is investigated. It is shown that choosing the right number of threads for an application is crucial to achieve the best performance possible when there is other workload running on the system. The default policy of creating as many threads as there are cores is rarely optimal in this situation and using the optimal number of threads reduces the runtime by 22.5% on average w.r.t. the default policy. Determining the optimal number of threads is not a straight-forward task, because it not only depends on the current workload but also on the program itself. For some programs, reducing the number of threads w.r.t. the default yields the optimal solution, whereas for other programs the best performance is achieved using more threads than there are cores. In order to tackle this problem, a novel technique for choosing the number of threads is presented in this work. Using Machine Learning techniques, a model is created that predicts the optimal number of threads based on the current system workload and the program. Different approaches for modelling this problem and several sets of features are evaluated. With the best model, a speedup of 92% of the optimal performance is achieved, which corresponds to a runtime reduction of almost 16% over the default policy.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

FlexPS: Flexible Parallelism Control in Parameter Server Architecture

As a general abstraction for coordinating the distributed storage and access of model parameters, the parameter server (PS) architecture enables distributed machine learning to handle large datasets and high dimensional models. Many systems, such as Parameter Server and Petuum, have been developed based on the PS architecture and widely used in practice. However, none of these systems supports ...

متن کامل

Learning Random Forests on the GPU

Random Forests are a popular and powerful machine learning technique, with several fast multi-core CPU implementations. Since many other machine learning methods have seen impressive speedups from GPU implementations, applying GPU acceleration to random forests seems like a natural fit. Previous attempts to use GPUs have relied on coarse-grained task parallelism and have yielded inconclusive or...

متن کامل

A Cost-Aware Parallel Workload Allocation Approach Based on Machine Learning Techniques

Parallelism is one of the main sources for performance improvement in modern computing environment, but the efficient exploitation of the available parallelism depends on a number of parameters. Determining the optimum number of threads for a given data parallel loop, for example, is a difficult problem and dependent on the specific parallel platform. This paper presents a learning-based approa...

متن کامل

Adaptive parallelism mapping in dynamic environments using machine learning

Modern day hardware platforms are parallel and diverse, ranging from mobiles to data centers. Mainstream parallel applications execute in the same system competing for resources. This resource contention may lead to drastic degradation in a program’s performance. In addition, the execution environment composed of workloads and hardware resources, is dynamic and unpredictable. Efficient matching...

متن کامل

Fault diagnosis in a distillation column using a support vector machine based classifier

Fault diagnosis has always been an essential aspect of control system design. This is necessary due to the growing demand for increased performance and safety of industrial systems is discussed. Support vector machine classifier is a new technique based on statistical learning theory and is designed to reduce structural bias. Support vector machine classification in many applications in v...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009